{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcc3 Solution for Exercise M1.02\n",
    "\n",
    "The goal of this exercise is to fit a similar model as in the previous\n",
    "notebook to get familiar with manipulating scikit-learn objects and in\n",
    "particular the `.fit/.predict/.score` API."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load the adult census dataset with only numerical variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "adult_census = pd.read_csv(\"../datasets/adult-census-numeric.csv\")\n",
    "data = adult_census.drop(columns=\"class\")\n",
    "target = adult_census[\"class\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous notebook we used `model = KNeighborsClassifier()`. All\n",
    "scikit-learn models can be created without arguments. This is convenient\n",
    "because it means that you don't need to understand the full details of a model\n",
    "before starting to use it.\n",
    "\n",
    "One of the `KNeighborsClassifier` parameters is `n_neighbors`. It controls the\n",
    "number of neighbors we are going to use to make a prediction for a new data\n",
    "point.\n",
    "\n",
    "What is the default value of the `n_neighbors` parameter?\n",
    "\n",
    "**Hint**: Look at the documentation on the [scikit-learn\n",
    "website](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)\n",
    "or directly access the description inside your notebook by running the\n",
    "following cell. This opens a pager pointing to the documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "KNeighborsClassifier?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "source": [
    "We can see that the default value for `n_neighbors` is 5."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a `KNeighborsClassifier` model with `n_neighbors=50`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "model = KNeighborsClassifier(n_neighbors=50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fit this model on the data and target loaded above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "model.fit(data, target)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use your model to make predictions on the first 10 data points inside the\n",
    "data. Do they match the actual target values?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "first_data_values = data.iloc[:10]\n",
    "first_predictions = model.predict(first_data_values)\n",
    "first_predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "first_target_values = target.iloc[:10]\n",
    "first_target_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "outputs": [],
   "source": [
    "number_of_correct_predictions = (\n",
    "    first_predictions == first_target_values\n",
    ").sum()\n",
    "number_of_predictions = len(first_predictions)\n",
    "print(\n",
    "    f\"{number_of_correct_predictions}/{number_of_predictions} \"\n",
    "    \"of predictions are correct\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute the accuracy on the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "model.score(data, target)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now load the test data from `\"../datasets/adult-census-numeric-test.csv\"` and\n",
    "compute the accuracy on the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "adult_census_test = pd.read_csv(\"../datasets/adult-census-numeric-test.csv\")\n",
    "\n",
    "data_test = adult_census_test.drop(columns=\"class\")\n",
    "target_test = adult_census_test[\"class\"]\n",
    "\n",
    "model.score(data_test, target_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": [
     "solution"
    ]
   },
   "source": [
    "Looking at the previous notebook, the accuracy seems slightly higher with\n",
    "`n_neighbors=50` than with `n_neighbors=5` (the default value)."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}